Deep Learning for Lung Cancer Diagnosis, Prognosis and Prediction Using Histological and Cytological Images: A Systematic Review

Simple Summary
Lung cancer is one of the most common and deadly malignancies worldwide. Microscopic examination of histological and cytological lung specimens can be a challenging and time-consuming process. Most of the time, accurate diagnosis and classification require histochemical and specific immunohistochemical staining. Currently, Artificial Intelligence-based methods show remarkable advances and potential in Pathology and can guide lung cancer diagnosis, subtyping, prognosis prediction, mutational status characterization, and PD-L1 expression estimation, performing with high accuracy rates. This systematic review aims to provide an overview of the current advances in Deep Learning-based methods for lung cancer using histological and cytological images.

Abstract
Lung cancer is one of the deadliest cancers worldwide, with a high incidence rate, especially in tobacco smokers. Accurate lung cancer diagnosis is based on distinct histological patterns combined with molecular data for personalized treatment. Precise lung cancer classification from a single H&E slide can be challenging for a pathologist, most often requiring additional histochemical and special immunohistochemical stains for the final pathology report. According to WHO, small biopsy and cytology specimens are the available materials for about 70% of lung cancer patients with advanced-stage unresectable disease. Thus, the limited available diagnostic material necessitates its optimal management and processing for the completion of diagnosis and predictive testing according to the published guidelines. In the new era of Digital Pathology, Deep Learning offers the potential for lung cancer interpretation to assist pathologists’ routine practice. Herein, we systematically review the current Artificial Intelligence-based approaches using histological and cytological images of lung cancer.
Most of the published literature centered on the distinction between lung adenocarcinoma, lung squamous cell carcinoma, and small cell lung carcinoma, reflecting the pathologist’s routine practice. Furthermore, several studies developed algorithms for lung adenocarcinoma predominant architectural pattern determination, prognosis prediction, mutational status characterization, and PD-L1 expression status estimation.


Introduction
Lung cancer is one of the most prevalent cancers worldwide, characterized by a high mortality rate, reaching up to 18% of total cancer-related deaths, with cigarette smoking being the leading cause [1]. Lung cancer is a heterogeneous disease, mainly classified as non-small cell lung carcinoma (NSCLC) and small cell lung carcinoma (SCLC) [2]. NSCLC constitutes the majority of lung cancer cases (85%) and is further classified into adenocarcinoma (ADC), squamous cell carcinoma (SCC), and large cell carcinoma (LCC), while the remaining 15% accounts for SCLC, which is characterized by neuroendocrine differentiation.
In the era of personalized medicine, lung cancer diagnosis and accurate classification strongly rely on cytological and histological subtyping by microscopic evaluation with standard histochemical stains and ancillary immunohistochemical staining [3]. Molecular testing is also necessary for personalized therapeutic targeting and monitoring and for stratifying patients to targeted therapy and immunotherapy [4,5]. According to published guidelines by the College of American Pathologists, the International Association for the Study of Lung Cancer, and the Association for Molecular Pathology, patients with advanced lung cancer with an ADC component should be tested for epidermal growth factor receptor (EGFR) mutations, anaplastic lymphoma kinase (ALK) and ROS proto-oncogene 1 (ROS1) rearrangements, the v-Raf murine sarcoma viral oncogene homolog B (BRAF) Val600Glu (BRAF V600E) mutation, RET proto-oncogene (RET) rearrangements, mesenchymal-epithelial transition (MET) exon 14 skipping mutations, Kirsten rat sarcoma (KRAS) mutations, and neurotrophic tyrosine receptor kinase (NTRK1-3) fusions [2,6]. Advanced-stage non-neuroendocrine carcinomas should be tested for programmed cell death ligand 1 (PD-L1) expression status, as patients with a PD-L1 Tumor Proportion Score (TPS) ≥ 50% are eligible for first-line treatment with the anti-PD-1 therapy pembrolizumab. Immunohistochemical assays are available for PD-L1 and ALK expression status detection [7][8][9][10]. Currently, reflex-ordered testing for lung cancer is gaining ground, underlining the necessity of collaboration between pathologists and oncologists. Although reflex testing is not feasible in many laboratories, it can provide additional valuable information, detect rare molecular alterations, and minimize testing turnaround time [3,11].
In the last decade, Deep Learning (DL) approaches, mostly Convolutional Neural Networks (CNNs), have become increasingly valuable in Pathology. Limitations concerning the shortage of pathologists worldwide, subjectivity in diagnosis, and intra- and interobserver variability could be overcome with the aid of DL models. Recent advances in lung cancer pathology leverage the potential of image analysis for cancer diagnosis from hematoxylin and eosin (H&E) whole slide images (WSIs) [12,13]. Considering that small biopsy and cytology specimens are the available material for 70% of lung cancer patients with advanced unresectable disease, DL methods could guide the diagnosis with high accuracy, minimizing the need for additional special stains required for differential diagnosis and preserving the already limited material for molecular testing [2,14,15].
In this review, we systematically outline the current applications of DL algorithms for lung cancer diagnosis, prognosis, and prediction using both histological and cytological images. We further summarize the extracted data into distinct categories based on the classification problem, presenting for each study the dataset details, the employed technical method and methodology, as well as the performance metrics. The categories are structured to be informative for both pathologists and cytologists, to provide a detailed analysis and a comprehensive guide to the existing DL applications for lung cancer, and to offer valuable information to researchers for further study.

Materials and Methods
The systematic review followed the recommendations of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) [16]. The protocol has not been registered.

Diagnosis
Jain et al. used DL architectures for detecting lung cancer from histopathological images pre-processed for size, normalization, and noise removal [18]. Three datasets were included achieving high-performance rates with an accuracy of over 97%. Jiao et al. proposed a rapid and efficient method for tumor classification called Deep Embedding-based Logistic Regression (DELR) [19]. DELR was applied in three different datasets (colorectal, lung, and breast cancer) and achieved an area under the curve (AUC) of over 0.95 for all three datasets. In lung cancer, the dataset consisted of 338 regions of interest (ROIs), including ADC and SCC images. Moreover, Kanavati et al.
trained a CNN to distinguish lung carcinoma from non-neoplastic tissue based on the EfficientNet-B3 architecture [20]. After training, the CNN was tested on four independent datasets and attained an AUC of more than 0.97, demonstrating its feasibility of generalization. Multiple Instance Learning (MIL) was employed for the same classification task without the need for manual annotations by pathologists [21]. A multi-organ classification using weakly supervised learning was performed by Tsuneki et al. on transbronchial lung biopsy WSIs [22]. The AUC values of the three different balanced training datasets collected from medical institutions were 0.879-0.933 (Table 1).
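Most of the studies above report a slide-level AUC, and several obtain the slide-level score from tile-level predictions without manual annotations. The following is a minimal sketch of both steps, not the implementation of any cited study: it assumes max pooling as the MIL aggregation rule (one common choice; the cited works may use other rules) and computes the AUC as the fraction of positive/negative slide pairs that are ranked correctly. The function names and the toy data are hypothetical.

```python
def slide_score_max_pool(tile_probs):
    """Aggregate tile-level tumor probabilities into one slide-level score.

    Max pooling is an assumption made for this sketch: a slide is scored
    by its most suspicious tile.
    """
    if not tile_probs:
        raise ValueError("a slide needs at least one tile")
    return max(tile_probs)


def roc_auc(labels, scores):
    """AUC as the probability that a random positive slide outscores a
    random negative slide (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative slide")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data invented for illustration: 1 = carcinoma, 0 = non-neoplastic.
slide_tiles = [[0.1, 0.9, 0.3], [0.2, 0.4], [0.5, 0.1], [0.1, 0.05]]
slide_labels = [1, 1, 0, 0]

slide_scores = [slide_score_max_pool(tiles) for tiles in slide_tiles]
print(roc_auc(slide_labels, slide_scores))
```

In this toy example three of the four positive/negative slide pairs are ranked correctly, which is why the AUC is sensitive to a single weakly scored carcinoma slide.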

Lung Cancer Classification
A common classification problem among all papers included refers to lung cancer tissue classification into the main categories of ADC, SCC, and SCLC according to WHO guidelines. Kanavati et al. developed a CNN for lung cancer subtyping (ADC, SCC, SCLC, and non-neoplastic tissue) trained on transbronchial biopsy (TBLB) images with mainly poorly differentiated carcinomas [24]. Their model was tested on four validation cohorts (one with TBLB specimens and three with surgical resections), performing with AUC over 0.9 on all datasets. The same classification problem was employed with weakly supervised CNN, including a smaller dataset from hospital archives and The Cancer Genome Atlas (TCGA) database [25]. The model had an overall accuracy of 97.3% and achieved an AUC of 0.856 in the TCGA cohort. In addition, three common CNNs (Inceptionv3, VGG-16, InceptionResNetV2) were used for lung cancer classification on TMAs. The InceptionV3 model achieved the highest performance; however, many cases of ADC and SCC were misclassified [26]. In a retrospective study by Yang et al., a six-type classifier model was designed for lung cancer (ADC, SCC, SCLC) as well as other lung diseases (pulmonary tuberculosis, organizing pneumonia) subtyping on H&E-stained slides [27]. The proposed classification task achieved great performance and consistency with experienced pathologists. In a different study, Yang et al. introduced a CNN for subtyping lung cancer in five classes, namely ADC, SCC, SCLC, large cell neuroendocrine carcinoma (LCNEC), and nontumor [28]. The customized model performed similarly or better than the pre-trained ones, although existing limitations of the study, such as the use of patches instead of WSIs and the limited dataset, resulted in moderate classification accuracies. Likewise, Kosaraju et al. applied a novel DL framework for classifying ADC, SCC, SCLC, and LCNEC, achieving an AUC of 0.96 [29]. The studies of Yang and Kosaraju et al. 
were the only ones that included LCNEC in the classifiers, representing the realistic diagnostic practice for a pathologist. Ilié et al. applied a DL algorithm for distinguishing SCLC, LCNEC, and atypical carcinoid (AC) [30]. A total of 150 H&E WSIs were included, and the model was in strong agreement with expert and general pathologists, achieving an AUC of 0.93. Lastly, in their recent study, Chen et al. proposed an immunohistochemical phenotype prediction system for upgrading the classification of lung cancer into ADC, SCC, and SCLC [31]. The WSI-based Immunohistochemical Feature Prediction System (WIFPS) discriminated lung cancer types on H&E slides based on the positive or negative expression scoring of characteristic biomarkers for each class (ADC: TTF-1, CK7, and Napsin-A; SCC: CK5/6, p40, and p63; SCLC: CD56, Synaptophysin, Chromogranin A, and TTF-1). Agreement between the WIFPS model and pathologists was high to almost perfect (Cohen's kappa of 0.7903-0.8891) in validation sets, and the AUC in surgical and biopsy images was over 0.8 in all validation cohorts. In addition, ALK status prediction achieved an AUC of 0.917; however, prediction of programmed cell death protein 1 (PD-1), PD-L1, KRAS, and EGFR status did not reach high performance (Table 2).
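Agreement between a model and pathologists is reported above as Cohen's kappa, which corrects the raw agreement rate for the agreement expected by chance. As a minimal sketch (the function name and the six toy cases are hypothetical and are not data from any cited study), kappa can be computed from two label lists as follows.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa = (p_observed - p_expected) / (1 - p_expected)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty case list")
    n = len(rater_a)
    # Fraction of cases on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label for every case.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy labels invented for illustration.
model = ["ADC", "ADC", "SCC", "SCC", "SCLC", "SCLC"]
pathologist = ["ADC", "ADC", "SCC", "ADC", "SCLC", "SCLC"]
print(cohens_kappa(model, pathologist))
```

Here the raters agree on five of six cases (0.833), but one third of that agreement is expected by chance, so kappa is 0.75 rather than 0.833.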

NSCLC Subtypes Classification
The diagnosis between ADC and SCC from a single H&E slide from a small biopsy or cytological material can be challenging. Thus, for precise diagnosis, additional staining for immunohistochemical biomarkers, such as TTF-1, CK5/6, CK7, pan keratin, p40, and p63, and histochemical stains, such as periodic acid-Schiff (PAS), must be performed. Several studies have addressed binary classification problems concerning NSCLC subtyping from H&E slides for an accurate and fast diagnosis. The majority of these mainly include ADC and SCC WSIs, mostly from the TCGA dataset, whereas the classification task is performed by a CNN or a combination of the state-of-the-art CNN architectures with varying approaches and techniques [37,47,[51][52][53][54][55][56]58,59]. Moreover, NSCLC subtyping was combined with genomic data, namely copy number variations (CNVs), from TCGA [42]. The authors demonstrated that their proposed LungDIG model could be of great importance not only for ADC and SCC diagnosis but also for stratifying patients for targeted therapies, as the performance metrics of the model were higher when WSI and CNV data were combined compared to when WSI or CNV features were used alone. Zhao et al. developed a weakly supervised DL model to localize ROIs on WSIs (AUC of 0.9602) and then accurately subtype NSCLC into ADC and SCC with high sensitivity and specificity rates (0.9474 and 0.8583, respectively) [43]. In another study extracting prominent deep features (DFs) for each histopathological image, classification accuracy was better, and the authors identified 15 DFs with the ability to classify lung cancer with an accuracy of over 85% [44]. The generalizability of the model was feasible in distinguishing ADC from SCC on 21 non-pulmonary carcinomas; however, classification accuracy reached 56% in the external validation cohort. Hou et al. performed a classification task of NSCLC subtyping into ADC, SCC, and ADC with mixed subtypes [57]. 
Their proposed framework was trained and tested on a TCGA dataset with a classification accuracy of 0.798. Masud et al. designed a classification framework for diagnosing lung and colon cancer from histopathological images from the LC25000 dataset [49]. The model achieved a peak classification accuracy of 96.33%; however, the lung ADC class had a higher misclassification rate. The same problem was addressed on the LC25000 dataset by other authors, with an overall accuracy of 99% [32,33,36,39]. The DarkNet-19 model reached accuracies of 97.57%, 99.87%, and 97.73% in classifying ADC, benign, and SCC images, respectively, while the overall accuracy of the model was 99.69% [45]. Likewise, Civit-Masot et al. employed Explainable Artificial Intelligence (AI) technologies [23]. Liu et al. used AI along with an activation function for cancer infiltration screening on histopathological images [38]. Their method was further utilized for lung cancer classification (ADC and SCC) using the LC25000 database, presenting good generalization ability. In a more recent study, Liu et al. proposed a novel method for automated detection of lung ADC infiltration using 780 images with sensitivity and specificity of 93.10% and 96.43%, respectively [60]. Utilizing a combination of molecular and histological data (gene expression data and WSIs, respectively) as input for NSCLC classification, Carrillo Perez et al. demonstrated that the fusion model could provide robust information for decision-making on targeted therapies [46]. Wang et al. proposed a platform for the automated classification of NSCLC into ADC, SCC, and normal regions as well as for prediction of the mutational status of 10 frequently mutated genes in ADC [50]. The model predicted the EGFR mutational status on ADC H&E WSIs with an AUC of 0.824. Similarly, a model for NSCLC subtyping (ADC, SCC, normal regions) achieved an AUC of 0.97 [56].
The authors trained the model to predict the mutational status in lung ADC slides. Of the ten frequently mutated genes in ADC, STK11 and KRAS had the highest AUC (0.845 and 0.814, respectively). An annotation-free DL method for the subtyping of NSCLC slides achieved high performance for ADC and SCC (AUC of 0.9594 and 0.9414, respectively) and could be employed in clinical practice as it overcomes the time-consuming process of annotations and limitations concerning the capacity/memory of WSIs [48]. Wang et al. developed a DL model to perform cancer lesion region segmentation and histological subtype classification on ADC and SCC slides [40]. The model showed high classification performance metrics (accuracy was 100% and 95.1%, sensitivity was 95.0 and 100.0%, and specificity was 95.2 and 100.0% for SCC and ADC classification tasks, respectively). Classification of transcriptomic lung ADC (bronchioid, squamoid, and magnoid) and/or SCC (primitive, classic, secretory, and basal) subtypes was performed by Yu and Antonio et al. [53,61]. In the first study, classification was performed on both ADC and SCC, resulting in a significant correlation between the transcriptomic subtype and the histopathology classification scores and achieving AUC of 0.771-0.892 and approximately 0.7 for ADC and SCC, respectively, with the employment of four CNNs. In the study of Antonio et al., ADC transcriptome subtype classification resulted in a classification accuracy of 98.9%. Lastly, Le Page et al. tried to distinguish squamous from non-squamous lung carcinoma from initial cytology and small biopsy specimens [41]. Their model performed with good classification accuracy, while the accuracy was slightly increased in the external validation cohorts when tissue microarrays (TMAs) were selected (accuracy rates of 0.78 in biopsies versus 0.82 in TMAs). 
Finally, two recent studies performed a binary classification between ADC and SCC using over 900 WSIs from TCGA and achieving an AUC of over 0.90 [34,35] (Table 2).
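The binary ADC versus SCC studies above report accuracy, sensitivity, and specificity, and the latter two depend on which class is treated as positive. The following sketch, with a hypothetical function name and invented toy labels, derives all three from paired ground-truth and predicted labels.

```python
def binary_metrics(y_true, y_pred, positive):
    """Accuracy, sensitivity, and specificity with `positive` as the target class."""
    tp = fn = fp = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # recall on the positive class
        "specificity": tn / (tn + fp),  # recall on the other class
    }


# Toy labels invented for illustration: one ADC slide is called SCC.
truth = ["ADC"] * 5 + ["SCC"] * 3
pred = ["ADC", "ADC", "ADC", "ADC", "SCC", "SCC", "SCC", "SCC"]

print(binary_metrics(truth, pred, positive="ADC"))
print(binary_metrics(truth, pred, positive="SCC"))
```

With two classes, switching the positive class swaps sensitivity and specificity while accuracy stays the same, so per-class figures should always be read together with the class they refer to.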

Lung ADC Predominant Architectural Patterns Classification
ADC cases exhibit various histological patterns. According to the WHO, there are five distinct histological subtypes (lepidic, acinar, papillary, micropapillary, and solid) that must be included in a pathology report when the material is a resection specimen [2]. The detection of ADC predominant architectural patterns has been the scope of several research papers (Table 3). Sadhwani et al. addressed a classification problem including six histological subtypes (acinar, lepidic, solid, papillary, micropapillary, cribriform) and then combined the predicted output with clinical data (smoking status, age, etc.) for tumor mutational burden (TMB) status prediction [62]. The AUC for ADC predominant architectural patterns classification was 0.93 and 0.92 for TCGA and the external validation cohort, respectively, while for the TMB status prediction, it was 0.71. Furthermore, a six-class model (lepidic, acinar, papillary, micropapillary, solid, benign) for histological subtype classification in lung ADC WSIs was in moderate agreement with pathologists' estimations [63]. In a similar study, ADC histological patterns were classified into five categories (solid, micropapillary, acinar, cribriform, non-tumor) using three different CNN architectures [64]. The best classification accuracy was 89.24%, while, in the study of Di-Palma et al., the histological classification of the known five patterns of lung ADC resulted in a classification accuracy of 94.51% [65]. Xiao et al. created a novel framework combining CNNs and graph convolutional networks for quantitative estimation of histopathological growth patterns in lung ADC slides [66]. Another lung ADC subtyping problem was addressed by Sheikh et al., achieving a high accuracy of 0.946 and outperforming the state-of-the-art models [67]. In a different study, Gao et al.
collected slides from ADC with micropapillary patterns and performed a binary classification problem for detecting the presence of a micropapillary pattern in ADC slides [68]. Maleki et al. investigated how several possible methodological errors, such as oversampling and data augmentation, can lead to poor generalizability performance and performed a binary classification task for the distinction of solid and acinar predominant histologic subtypes in ADC H&E slides [69].
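To make the notion of a predominant architectural pattern concrete, the sketch below turns tile-level pattern predictions into per-pattern proportions and picks the largest one. This is an illustrative assumption only: it weights every tile equally and ignores benign tiles, whereas the cited studies may quantify patterns by area or by other rules. The function names and toy labels are hypothetical.

```python
from collections import Counter


def pattern_proportions(tile_labels, ignore=("benign",)):
    """Share of each growth pattern among tumor tiles (tiles in `ignore` are skipped)."""
    counts = Counter(label for label in tile_labels if label not in ignore)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pattern: n / total for pattern, n in counts.items()}


def predominant_pattern(tile_labels, ignore=("benign",)):
    """Pattern with the largest share, or None if no tumor tile was found.

    Ties are resolved in favor of the pattern encountered first.
    """
    proportions = pattern_proportions(tile_labels, ignore)
    if not proportions:
        return None
    return max(proportions, key=proportions.get)


# Toy tile predictions invented for illustration.
tiles = ["acinar"] * 5 + ["solid"] * 3 + ["lepidic"] * 2 + ["benign"] * 10
print(pattern_proportions(tiles))
print(predominant_pattern(tiles))
```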

Prediction of Prognosis and Survival
The quantification and evaluation of the tumor microenvironment (TME) features from histopathological images, derived by the spatial distribution of different cell types (lymphocytes, stromal cells), the density of stromal cells, etc., provide valuable information not only for immune therapy response but also for the probability of survival [70]. TME plays an important role in immunotherapy response as well as in cancer progression and metastasis in lung cancer. Several studies have aimed to develop algorithms for TME characterization of lung cancer pathology images to predict response to targeted therapies and extract prognostic value. Barmpoutis et al. proposed a methodology to identify and quantify tertiary lymphoid structures (TLS) in lung cancer H&E images [71]. Segmentation of lymphocytes showed that their density within a TLS region was 3-fold higher than lymphocytes outside TLS regions. Their study had high sensitivity and specificity rates and could be used as a prognostic feature to predict response to immunotherapy. DeepRePath was proposed for prognosis prediction in patients with early-stage ADC [72]. On the external validation cohort, DeepRePath had an AUC of 0.76, while histopathological features, such as necrosis or atypical nuclei, were associated with a higher probability of recurrence. The same model, DeepRePath, was employed by Wu et al. for predicting the recurrence risk of lung cancer, achieving an AUC of 0.79 on a small testing cohort [73]. In the study of Wang et al., cell type classification into tumor cells, stromal cells, and lymphocytes achieved great classification accuracy [74]. TME analysis for spatial distribution estimation associated TME with overall survival (OS) and could provide valuable information about the patient's prognosis. In a similar framework, Wang et al. proposed a CNN for a 6-class classification problem to identify different cell types nuclei for estimating TME and its prognostic value [75]. 
The derived features from the TME analysis were indicators of OS. For instance, higher karyorrhexis density was associated with worse survival outcomes, while higher stromal nuclei density was associated with better survival outcomes. Moreover, segmentation of cell nuclei on H&E WSIs was applied to identify and quantify tumor-infiltrating lymphocytes (TILs) for prognostic value on NSCLC patients [76]. The authors highlighted the potential of their proposed model for quantifying TILs, instead of immunohistochemical staining (CD8), for assisting pathologists. Likewise, the quantitative and spatial localization characteristics of TILs and tumor cells were evaluated for OS and relapse-free survival (RFS) in NSCLC cohorts [77]. From 10 immune checkpoint proteins, galectin-9 and OX40L had the higher relative contribution to OS (33.55%) and RFS (29.02%), respectively, while the percentage of positive tumor cells and the distance between positive TILs and positive tumor cells contributed the most to predict OS. A two-step approach of a DL method was proposed by Pham et al. for detecting lung cancer lymph node metastasis [78]. The proposed approach was developed to eliminate false positive results by performing a first classification task for distinguishing reactive lymphoid follicles from lung cancer in lymph nodes. In the study of Rączkowski et al., tumor prevalence and TME composition were used as input for predicting survival and gene mutations in lung ADC cases [79]. The prediction of OS on the lung dataset was evaluated according to clinical and demographic data [80]. The proposed weakly supervised and annotation-free CNN achieved a C-index of 0.7033, and features such as TILs, necrosis, and inflamed stromal regions were identified as prognostic factors associated with poor outcomes. Estimation of lung ADC tumor cellularity for genetic tests by pathologists could be improved by DL support. Sakamoto et al. 
showed that tumor cellularity can be estimated with minimum deviation from the ground truth when pathologists and AI scores are combined [81]. Pathologists' estimations deviated from the ground truth by approximately 15%, implying over- or under-estimations; however, false positive results were obtained by AI when cell blocks were evaluated. Prediction of lung ADC recurrence in several predominant subtypes, including acinar and papillary carcinoma, after complete resection achieved an accuracy of 90.9% in H&E WSIs from 55 patients [82]. The density of cancer epithelium and cancer stroma lymphocytes was calculated in H&E slides from lung ADC cases to predict patients' survival [83]. Low score rates were associated with significantly superior OS and disease-free survival in patients with ADC. The authors also included RNA transcripts to determine the TIL infiltration in the high-risk and low-risk groups, revealing that patients in the low-risk group had a higher proportion of CD8+ T cells, activated CD4+ memory T cells, and plasma cells than those in the high-risk group. Slides of lung ADC immunohistochemically stained for CD3, CD8, and CD20 were used for the detection and quantification of immune cell biomarkers [84]. High sensitivity and specificity rates were recorded in discriminating T cells, considering the immunostaining intensity variables and the presence of anthracotic pigment in the tissue slides. In a recent study, a DL method was employed for predicting aneuploidy from lung ADC WSIs by performing nuclei segmentation and using a single-cell analysis [85] (Table 4).
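Survival models in this section are summarized with a concordance index (C-index), such as the value of 0.7033 reported above. As a minimal sketch, the function below computes Harrell's C-index from follow-up times, event indicators, and predicted risk scores. It is a simplified version that skips pairs with tied times, and the function name and the four toy patients are hypothetical.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index: fraction of comparable patient pairs in which the
    patient with the earlier observed event has the higher predicted risk.

    A pair (i, j) is comparable when patient i had an observed event
    (events[i] is truthy) strictly before patient j's last follow-up time.
    Pairs with tied times are ignored in this simplified sketch.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored patients cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy cohort invented for illustration (months, 1 = death observed, risk score).
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
risks = [0.9, 0.4, 0.6, 0.1]
print(concordance_index(times, events, risks))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking; in the toy cohort four of the five comparable pairs are ordered correctly.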

Prediction of Significant Molecular Alterations
Molecular detection of prognostic and predictive biomarkers in specific histological subtypes can predict favorable responses to targeted therapy and treatment. The detection of significant molecular alterations on immunohistochemistry (IHC) slides using DL algorithms was the scope of several studies. Concerning ALK rearrangement prediction, in the study by Terada et al., the commercially available HALO-AI platform and DenseNet were employed on IHC slides, achieving a maximum AUC of 0.73 (at a resolution of 1.0 µm/pixel) [89]. Another study aimed to predict mutations (EGFR, BRAF, TP53, STK11, and KRAS) based on Next Generation Sequencing (NGS) data and H&E WSIs from ADC samples with several deep neural network-based models [90]. Prediction of EGFR and TP53 mutations achieved better performance than prediction of the remaining genes in the study. In the study of Wang et al., the proposed model for predicting the mutational status of 10 frequently mutated genes in ADC slides had the best performance for EGFR mutational status with an AUC of 0.824 [50]. Similarly, Coudray et al. trained the Inceptionv3 network to predict the mutational status of 10 genes in lung ADC, with STK11 and KRAS having the highest AUC of 0.845 and 0.814, respectively [56]. In addition, high performance was recorded for TP53 and EGFR biomarker prediction, with AUC of 0.87 and 0.84, respectively, in the study of Yang et al. [87]. However, the model was not validated on an external cohort, and only 180 WSIs from the TCGA database were used. MET, FGFR1, and FGFR2 mutations were predicted with accuracies of 86.3%, 83.2%, and 82.1%, respectively [91]. The recent study by Mayer et al. was the first to employ DL for predicting ROS1 rearrangement directly from H&E WSIs [92]. ROS1 rearrangement prediction reached sensitivity and specificity of 100% and 98.48%, respectively. Moreover, the characterization of intra-tumor heterogeneity in ADC by gene expression levels was associated with patients' survival [86].
In the lung cancer dataset, the highest AUC was detected for miR-17-5p microRNA, followed by KRAS and CD274 (PD-L1). Another study determined TMB value (low or high) according to a selected threshold in lung ADC WSIs. TMB value was predicted for each area of the image, reflecting the heterogeneity of TMB [93]. No significant correlation between the TMB status and the tumor stage of the patient was noted, while the performance of the DL model was relatively low, with an AUC of 0.641. Likewise, the prediction of TMB in 50 SCC H&E images achieved an AUC of 0.65 (Table 5) [94].

Cytology
Cytological specimens from the lung are frequently the only available diagnostic material. However, by its nature, this material is limited, prohibiting auxiliary techniques for specific subtyping, such as immunocytochemistry. Only a limited number of studies have addressed the issue of utilizing cytological images for training neural networks for lung cancer diagnosis and subtyping (Table 6). The first study for the classification of lung cancer cytological images (ADC, SCC, SCLC) achieved a classification accuracy of 71% after the data augmentation process [97]. In addition to this study, Teramoto et al. further extended their work to the classification of lung cytological images (real and synthesized) into benign and malignant with a generative adversarial network (GAN) [98]. The proposed method achieved an AUC of 0.901. Similarly, the classification of benign and malignant cells from cytological pleural effusion WSIs by a weakly supervised model achieved an AUC of 0.9526 [99]. The model had a significantly strong correlation with the histological diagnosis gold standard as well as with senior cytopathologists' diagnosis. Misclassification was observed when poor adhesion of tumor cells or clusters of mesothelial cells were present. Diagnosis between benign and malignant cells from cytological specimens was performed in the studies of Lin and Teramoto et al., including 499 and 322 images, respectively [100,101]. Distinct morphological features (size of cells, nuclei, and nucleoli) of cytological specimens of lung cancer were recognizable by four different fine-tuned deep CNNs (DCNNs) [102]. Three out of four DL models resulted in a classification accuracy of more than 73% for lung cancer subtyping into ADC, SCC, and SCLC; however, some cases of poorly differentiated NSCLC were misclassified. Furthermore, the distinction between LCNEC and SCLC showed promising results in the study of Gonzalez et al. [103].
Three classifiers were developed with three distinct datasets of Diff-Quik®-, Papanicolaou-, and H&E-stained cytological WSIs and achieved an AUC of 1, 1, and 0.875, respectively. Lastly, endobronchial ultrasound (EBUS)-guided transbronchial needle aspiration (TBNA) cytological images were employed for diagnosing mediastinal metastatic lesions [104]. The study by Wang et al. was the first to include EBUS-TBNA cytological images for automatic segmentation of enlarged mediastinal lymph node metastasis, outperforming three state-of-the-art baseline models.

PD-L1 Expression Status
PD-L1 is an immune checkpoint protein expressed on tumor cells and activated immune cells [105]. In NSCLC patients, assessment of PD-L1 expression is pivotal for guiding patients' treatment selection with immune checkpoint inhibitors (ICIs). IHC is the currently accepted diagnostic assay performed on formalin-fixed paraffin-embedded (FFPE) lung tissue or cytological specimens [106]. There are different platforms for IHC interpretation, PD-L1 antibodies, guidelines for evaluation and scoring, as well as positivity cut-offs for immunotherapy selection. Currently, four IHC assays (28-8 and 22C3 from DAKO, SP263 and SP142 from Ventana) have been approved for use by the Food and Drug Administration (FDA). The 22C3 and 28-8 pharmDx (DAKO) IHC assays are companion diagnostics for selecting patients for pembrolizumab and nivolumab, respectively [107,108]. SP142 and SP263 (Ventana) IHC assays are also FDA-approved for companion diagnostic to atezolizumab. Evaluation of PD-L1 expression with the 22C3 and 28-8 pharmDx, as well as SP263 (Ventana) assays, only refers to the PD-L1 expression on tumor cells, while, on the other hand, the SP142 (Ventana) assay refers to tumor and immune cells staining [109]. As PD-L1 scoring algorithms determine the therapeutic choice and interobserver discordance is common, it is conceivable that quantitative validation of PD-L1 expression by DL algorithms may assist pathologists in their assessment. In a recent study, Hondelink et al. developed a fully supervised DL model for PD-L1 TPS assessment in NSCLC WSIs according to three cut-off points (<1%, 1-50%, and 50-100%) [110]. TPS prediction was in concordance with the mean score of three pathologists in 79% of the cases. Misclassification of some cases was noted when positive PD-L1 immune cells were present around the tumor site, the intensity of PD-L1 positive neoplastic cells was weak, or when non-membranous staining was detected. In a similar framework, Liu et al. 
performed tumor region segmentation and nuclei detection for PD-L1 TPS prediction on SCC WSIs according to three cut-off points (<1%, 1-49%, and ≥50%) [111]. The predictions of their proposed model were compared with the assessments of pathologists with different levels of experience. The model's classification accuracy was 74.51%, higher than that of trainees (71.55%) but lower than that of subspecialist and non-subspecialist pathologists (97.06% and 84.03%, respectively). In another study, TPS assessment reached high sensitivity and specificity at both the 1% and 50% cut-off points [112]. The classification was performed on slides stained with the 22C3 antibody, and the proposed patch-based dual-scale categorization method, built on the VGG16 architecture, outperformed plain VGG16. The study of Sha et al. resulted in an AUC of 0.80 on a balanced testing cohort in classifying positive and negative PD-L1 tumor cells [113]. In SCC-separated cases, the model achieved a lower AUC compared to ADC cases (0.64 and 0.83, respectively), possibly due to an imbalance in the training cohort. In the studies of Kapil et al., TPS was estimated by dividing the number of pixels of positive tumor cells by the total number of pixels of positive and negative tumor cells [114,115]. Of all the included studies estimating PD-L1 TPS, these two were the only ones using slides stained with the SP263 antibody with the cut-off point defined at 25%. In their first study, fully and semi-supervised network architectures were used for estimating TPS in NSCLC specimens, with results agreeing with pathologists' evaluation, while, in their subsequent study, TPS estimation was performed with a GAN. Two classification problems were addressed, namely, a binary task for epithelial and non-epithelial region segmentation as well as TPS estimation. An additional dataset of WSIs stained with the epithelial marker Pan-Cytokeratin was used for the binary segmentation task of the epithelial benign and malignant regions.
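The pixel-ratio definition of TPS and the three-tier cut-off scheme described above can be illustrated with a minimal sketch. The function names and the pixel counts are hypothetical and do not come from any of the reviewed studies; the counts are assumed to be produced by an upstream tumor-cell segmentation step, and the tiers follow the <1%, 1-49%, ≥50% scheme (studies using other cut-offs, such as 25%, would need different thresholds).

```python
def tumor_proportion_score(positive_tumor_px: int, negative_tumor_px: int) -> float:
    """Return TPS in percent from pixel counts of PD-L1 positive and negative tumor cells."""
    total = positive_tumor_px + negative_tumor_px
    if total == 0:
        # No tumor pixels were segmented, so the ratio is undefined.
        raise ValueError("no tumor pixels found; TPS is undefined")
    return 100.0 * positive_tumor_px / total


def tps_category(tps_percent: float) -> str:
    """Map a TPS value to the three-tier scheme (<1%, 1-49%, >=50%)."""
    if tps_percent < 1.0:
        return "<1%"
    if tps_percent < 50.0:
        return "1-49%"
    return ">=50%"


# Hypothetical counts: 1200 positive and 2800 negative tumor pixels.
tps = tumor_proportion_score(1200, 2800)
print(tps, tps_category(tps))  # 30.0 1-49%
```

Because the category depends on hard thresholds, small errors in the pixel counts near 1% or 50% can change the reported tier, which is one reason consistent ground-truth labeling matters for this task.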
In the study of Wu et al., PD-L1 IHC slides stained with the 22C3 assay were used for training U-Net to perform tumor area detection and TPS calculation [116]. The model was highly consistent with trained pathologists and achieved high performance when further tested on SP263 (Ventana)-stained slides (accuracy of 0.9326 and 0.9624 for 22C3- and SP263-stained slides, respectively). Furthermore, the authors demonstrated that the AI-based model could help untrained pathologists with TPS assessment by reducing the time of microscopic examination. In the same framework, three automated workflows based on DL, including both 22C3 (DAKO) and SP263 (Ventana) IHC assays, and two cut-off points (<1%, ≥50%), achieved better performance in the <1% cut-off point [117]. The model by Choi et al. achieved an area under the receiver operating characteristic curve (AUROC) of 0.889 in detecting PD-L1 positive and negative tumor cells and estimating TPS value, while it significantly increased the concordance of pathologists after a disagreement (initial/baseline concordance of 81.4% versus revised concordance of 90.2%) [118]. The performance of the Aitrox AI model for PD-L1 expression assessment, reported by Huang et al., was comparable to that of experienced pathologists and surpassed that of inexperienced ones (Table 7) [119].
Table 7. Characteristics of studies developing models for the assessment of programmed cell death ligand 1 expression in lung cancer using histological data.

Deep Learning Approaches
From a clinical point of view, the main challenges in Digital Pathology are (i) the extremely large size of the images produced by whole-slide scanning, together with the requirement for pathologists to evaluate the entire specimen, and (ii) the digitization of annotated findings of interest, which is a very demanding and time-consuming process. The latter, combined with the fact that DL techniques require a large amount of training data, makes it harder to obtain reliable results. Many studies presented in the literature attempt to overcome the lack of annotations by using weakly supervised or semi-supervised learning techniques instead of fully supervised approaches. These approaches build on known CNN architectures to classify image patches or to detect tissue alterations and/or morphological features of cancer. Weakly supervised learning is a branch of machine learning (ML) that aims to use fewer or lower-quality labels for training predictive models. It works by leveraging the unlabeled data or refining the labels to improve the model performance. In terms of Digital Pathology, weakly supervised methods use a small number of annotations by selecting informative patches to classify the WSIs [25]. General approaches of weakly supervised learning in histopathological images have been proposed, employing VGG-16 [25], EM-CNN [52], EfficientNet-B3 [20], and ResNet [80]. Furthermore, most of the presented studies in this category employ MIL [34,85], which is a weakly supervised learning technique that groups data points (instances) into bags. Labels are assigned to bags rather than to individual instances, and a bag's label reflects the presence of instances of that class within it. This technique is well-suited for histology slide classification because it is designed to operate on weakly-labeled images [65]. For example, clustering-constrained-attention MIL (CLAM), developed by Lu et al.
[47], is a weakly supervised method that uses attention-based learning to automatically identify subregions of high diagnostic value and, thus, accurately classify the whole slide. Other works combine the MIL approaches with well-known architectures of CNNs, such as ResNet [48,65,95], EfficientNetB1 [22], and SimCLR [59]. Moreover, Teramoto et al. [101] compared several CNNs as backbones (LeNet, AlexNet, ResNet, Inception, DenseNet) using MIL and an attention mechanism, while Hou et al. [57] presented 14 different combinations of an expectation maximization (EM)-based MIL approach with Logistic Regression and Support Vector Machine (SVM). Finally, another work that attempted to overcome the lack of labeled data employed a semi-supervised approach inspired by YOLOv5 for the detection of micropapillary lung ADC. This method implements a teacher model, which is trained directly on the ground truth data, and a student model, which learns indirectly from the teacher model [68].
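To make the idea of attention-based MIL more concrete, the following is a minimal sketch of how patch-level features (instances) might be pooled into a single slide-level (bag) representation. It is not the CLAM implementation or the method of any reviewed study: the function names are hypothetical, the attention score is simplified to a plain dot product with a fixed vector (published attention-MIL models learn this scoring function, usually with a small neural network), and the features are made-up numbers.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    patch_features: Sequence[Sequence[float]],
    attention_vector: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Pool patch features into one bag embedding using attention weights.

    Each patch receives a raw score (dot product with attention_vector),
    softmax turns the scores into weights that sum to 1, and the bag
    embedding is the weighted sum of the patch features.
    """
    if not patch_features:
        raise ValueError("a bag must contain at least one patch")
    scores = [
        sum(f * w for f, w in zip(feat, attention_vector))
        for feat in patch_features
    ]
    weights = softmax(scores)
    dim = len(patch_features[0])
    bag = [
        sum(wt * feat[d] for wt, feat in zip(weights, patch_features))
        for d in range(dim)
    ]
    return bag, weights


# Hypothetical bag of three patches with two features each.
patches = [[0.0, 1.0], [2.0, 0.0], [0.5, 0.5]]
bag_embedding, patch_weights = attention_pool(patches, attention_vector=[1.0, 0.0])
print(patch_weights.index(max(patch_weights)))  # 1: the second patch dominates
```

The weights double as a rough indication of which patches drove the slide-level decision, which is how attention-based methods highlight subregions of high diagnostic value without patch-level annotations.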
From a technical perspective, the extremely large size of the images and the complexity of the classification or detection problems posed on them make the process very demanding in terms of computational resources and training time. Typically, researchers can follow two main approaches: (i) develop a custom architecture, implementing all the components of both convolutional and fully connected layers and defining all the hyperparameters of the network, or (ii) use pre-trained architectures and take advantage of transfer learning from other datasets (e.g., ImageNet). Custom architectures can be more accurate than pre-trained CNNs with transfer learning if they are designed well for a specific problem and trained on an adequate set of images. However, they require more time and resources to develop and train. For these reasons, custom CNNs are usually shallower than the pre-trained models, to keep the implementation effort and computational requirements manageable. Thus, most of the presented custom architectures for lung cancer consist of up to three convolutional layers as well as up to three fully connected layers [21,49,61,74,82,97]. One of these works utilizes two different color spaces, developing two identical feature extractors, one for RGB and one for HLS [82]. More extended architectures have also been presented, with six convolutional and two dense layers [84] or more than five convolutional layers along with one deconvolution layer for upscaling [40,104]. Finally, the deepest CNN in this category, called Deep Hipo, operates on two magnifications (20× and 5×) and is based on CAT-NET, with 19 layers in total [28].
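To give a sense of scale for the image-size constraint discussed above, the sketch below counts how many fixed-size patches are needed to cover a whole-slide image. The slide dimensions (100,000 × 80,000 pixels), the patch size, and the function names are illustrative assumptions, not values taken from the reviewed studies.

```python
import math
from typing import Iterator, Tuple


def patch_grid(width: int, height: int, patch: int, stride: int) -> Tuple[int, int]:
    """Number of patch columns and rows needed to cover the image, edges included."""
    cols = max(1, math.ceil((width - patch) / stride) + 1)
    rows = max(1, math.ceil((height - patch) / stride) + 1)
    return cols, rows


def patch_origins(width: int, height: int, patch: int, stride: int) -> Iterator[Tuple[int, int]]:
    """Yield the top-left (x, y) corner of every patch, row by row."""
    cols, rows = patch_grid(width, height, patch, stride)
    for r in range(rows):
        for c in range(cols):
            # Clamp the last row/column so the patch stays inside the image.
            # An image smaller than one patch yields a single origin at (0, 0).
            x = min(c * stride, max(0, width - patch))
            y = min(r * stride, max(0, height - patch))
            yield x, y


# Assumed slide size and non-overlapping 512-pixel patches.
cols, rows = patch_grid(100_000, 80_000, patch=512, stride=512)
print(cols, rows, cols * rows)  # 196 157 30772
```

Under these assumptions a single slide already yields tens of thousands of patches, which is why patch selection, lower magnifications, or shallower networks are attractive when computational resources are limited.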
More sophisticated DL methods have been proposed, either modifying known CNN architectures or combining different CNN architectures with each other or with classic ML techniques. Most of the modified architectures are based on ResNet. DeepRePath [72] is a novel CNN model based on ResNet-50 that operates on different magnifications by building two CNNs, while a similar approach proposed by Sha et al. [113] developed two branches for the processing of small and large field-of-view features of PD-L1 classes. On the other hand, SE-ResNet-50 [38] focuses on the improvement of the activation function, introducing CroRELU. Other novel modifications of known architectures are the KimiaNet22 based on DenseNet [44], the MR-EM-CNN, which extracts hierarchical multiscale features on an EM-CNN model [43], the DSC-VGG16, which provides a dual scale categorization of PD-L1 classes based on VGG16 [112], the WIFPS model [31] based on EfficientNet-B5, and the novel architecture proposed by Gonzales et al. [103], which utilizes three different stains. Finally, Rączkowski et al. [79] developed a novel architecture called ARA-CNN, which is inspired by both ResNet and DarkNet models.
By combining different CNN models or CNNs with classic ML techniques, researchers attempt to provide better performance in several categories of lung cancer problems. Combinations of different CNN models presented in the literature are (i) ResNet-50 with U-Net [111], (ii) EfficientNet with U-Net [77], and (iii) DeepLabV3 with Inception-ResNetV2 [71]. By combining DL and ML approaches, Wang et al. [42] introduced the LungDIG architecture, which employs an Inception-V3 model along with a classic multilayer perceptron. Two other approaches extract deep features utilizing the convolution layers of CNNs and then provide predictions using logistic regression [19,62]. SVM has also been used in cooperation with CNN models. Perez et al. [46] merged information from ResNet-18 from the processing of WSIs along with SVM from RNA-sequencing data, while Togaçar et al. [45] and Hu et al. [88] combined SVM with DarkNet and Xception models, respectively. Finally, principal component analysis (PCA) techniques have been used along with CNN architectures for dimensionality reduction of the extracted features [18,88]. The contribution of DL in lung cancer presents several other methods that employ Graph-based CNNs, GANs, and autoencoders. Graph CNNs have been used to identify regions or cell structural features that are highly associated with the class label. In this category, three approaches have been proposed, where Graph-based modules are combined with AlexNet [54], VGG16 [66], and ResNet [37]. GANs are mostly used to generate informative synthetic sets of images in order to increase the training set and, thus, avoid overfitting issues. DASGAN, which is an extension of the CycleGAN architecture, has been introduced [114], merging two stains and leading deep survival learning methodology.
Teramoto et al. introduced a progressive growing approach of GANs (PGGAN) combined with the VGG-16 model [98], while Mayer et al. [92] combined GANs with semi-supervised learning. Another auxiliary classifier GANs (AC-GANs) approach has been proposed by Kapil et al. [115] to generate classifier models and detect ALK and ROS1 fusions directly from H&E images. Finally, an unsupervised DL model that employs stacked autoencoders has been developed by Sheikh et al. [67].

Discussion
DL is progressively embraced in Pathology, especially for breast, colorectal, prostate, and lung cancer diagnosis, transforming the current landscape of medicine [120][121][122][123][124][125][126][127][128][129][130][131]. AI could play a pivotal role in the multidisciplinary approach to diagnosis and patient management. As already underlined above, in lung cancer, classification, accurate diagnosis and subtyping depend on distinct morphological features among cancer cells combined with staining patterns, tumor biological characteristics, and molecular data of mutations. Lung cancer histology is characterized by cellular heterogeneity, challenging the diagnostic process [132]. Several histological features can be defined by examining a single H&E-stained slide, such as glandular differentiation in lung ADC, the presence of keratinization and intercellular bridges in SCC, as well as scant cytoplasm and poorly defined cell borders in SCLC. However, for differential diagnosis, special immunohistochemical staining is required for accuracy. According to the WHO guidelines, the terminology for lung cancer classification in small biopsies or cytology and resection specimens must follow the proposed criteria [2]. For example, in resection specimens, lung ADC cases must be morphologically determined by the predominant histological pattern (lepidic, acinar, papillary, micropapillary, solid). The distinction of lung neuroendocrine tumors (NETs) directly from the H&E slide can also be challenging, whereas NETs are further classified as typical carcinoids, atypical carcinoids, SCLC, and LCNEC. Given that small biopsies and cytology specimens are encountered for diagnosis in about 70% of the patients, the available diagnostic material is often limited and thus, every effort should be employed to preserve sufficient material for molecular analysis. 
Therefore, it is strongly recommended to use only a limited panel of biomarkers, including the most representative ones for immunostaining for differential diagnosis. However, this approach can hamper accurate diagnosis. Here, AI could be of great help to the pathologist by guiding with high accuracy the prevailing diagnosis from an H&E-stained slide.
Data extraction of our systematic review demonstrated that DL-based methodologies for lung cancer diagnosis are mainly performed on histological H&E WSIs, with ADC versus SCC being the predominant classification task, as shown in Table 2. All the studies achieved high classification accuracy for identifying ADC and SCC. Secondly, many studies utilized different CNN architectures for classifying ADC, SCC, and SCLC in small biopsies. The highest performance was achieved in the study of Kanavati et al. [24] (AUC of 0.94-0.99), which included a large number of images. Only two studies designed a classification task for identifying ADC, SCC, SCLC, and LCNEC on WSIs [27,28]. This 4-class task represents the realistic daily practice of a pathologist. In both studies, the AUC was over 0.90, suggesting that such DL models could be employed and be of great value in a pathology laboratory. The third most common approach in histological slides was the employment of DL-based models for lung ADC histological subtyping. The studies of Sheikh [67] and DiPalma et al. [65] achieved the highest classification accuracy performing a 5-class problem (lepidic, acinar, papillary, micropapillary, solid). Albeit limited in number, eight noteworthy studies utilized cytological slides for lung cancer diagnosis or classification. Four of them performed a binary classification task for benign and malignant cell detection [98][99][100][101]. All studies showed good classification accuracy; however, compared to the classification problems performed on histological data, the dataset was limited in the majority of the studies. In addition, in the cytology section, the most common classification task for lung cancer (ADC, SCC, and SCLC) resulted in modest classification accuracies, including state-of-the-art architectures (66-77% and ~71%), with the main limitation being the small number of images included for training (55 and 76 cytological slides, respectively) [97,102].
Prediction of OS and risk of recurrence, as well as identification of prognostic features, were also the aim of many research papers, in which the predicted output emerged after nuclei segmentation, TILs quantification, identification of gene expression, or clinical data. The highest AUC (0.917) for ALK rearrangements prediction was in the study by Chen et al. [31], while EGFR mutations were predicted with an AUC of 0.824, 0.84, and 0.83 in the studies by Wang, Yang, and Coudray et al., respectively [31,50,56,87]. In the most recent study by Pao et al., the prediction of EGFR mutational status in 2099 lung ADC tissue specimens reached an AUC of 0.87 [95]. As far as PD-L1 quantification is concerned, the majority of studies included datasets consisting of WSIs stained with the 22C3 antibody. The remaining studies included slides stained with the SP263 antibody or a combination of 22C3 and SP263 antibodies. For quantitative problems, such as TPS estimation for PD-L1 expression, labeling ground truth must be as consistent as possible to avoid misclassification concerning the specific cut-off points for PD-L1 evaluation. DL-based models for PD-L1 TPS estimation offer several advantages to pathologists as TPS quantification is a time-consuming process prone to subjective estimation. Despite the extensive research and progress on histological images, further research on cytological material, including a larger dataset, is considered essential for optimizing classification performance.
From a technical point of view, summarizing the methods presented in the literature, most of them (78 studies) developed supervised learning methodologies, basically dealing with classification problems of the medical question. Specifically, 11 studies implemented custom CNN architectures, 36 studies employed known models with or without transfer learning, 11 studies modified known architectures, and, finally, 14 studies combined CNNs either with each other or with ML techniques. Apart from the above crisp categories of supervised learning, the category named "other methods" contained six supervised, one weakly supervised, and one unsupervised method (eight studies in total). Weakly supervised methods are 13 in total, while there are one semi-supervised and one unsupervised method (Figure 2). To identify the most commonly used known architectures, the employed architectures have been counted for each study, and the results are presented in Figure 3. Note that several studies have not used known architectures (for example the studies that develop custom CNN architectures), while several studies employ more than one.
Our review shows that many of the employed DL methods in lung cancer are particularly extensive and sophisticated, and that they are progressively evolving into new techniques following the development of AI. According to the comparative studies presented in this review, DL methods overall outperform traditional ML techniques. This superiority of DL could partially be explained by the quality of the features feeding the fully connected layers. The features in CNNs are not selected subjectively by the specialists but are automatically extracted from the convolutional layers, maximizing the carried information.
Comparing the reviewed architectures, it is evident from the results of the review that ResNet-based and Inception-based architectures have been used in about half of the methods presented in the literature, showing high performances compared to other architectures. The existence of residual blocks in most of these architectures (all ResNet and InceptionV4 models) seems to operate efficiently and effectively in biopsy image processing. Jumping features directly from a convolutional layer to many subsequent layers operates like merging features from different digital magnifications of scanning. Such a procedure seems to make sense for biopsy imaging, where different magnifications of scanning provide different knowledge about the microenvironment of the cells.
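The skip connection at the heart of a residual block can be sketched in a few lines. This is a deliberately simplified illustration on plain feature vectors, with hypothetical function names; real residual blocks apply convolutions, normalization, and learned weights inside the transform, which are omitted here.

```python
from typing import Callable, List

Vector = List[float]


def relu(x: Vector) -> Vector:
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in x]


def residual_block(x: Vector, transform: Callable[[Vector], Vector]) -> Vector:
    """Return relu(transform(x) + x): the skip connection adds the input back."""
    fx = transform(x)
    if len(fx) != len(x):
        raise ValueError("transform must preserve the feature dimension")
    return relu([a + b for a, b in zip(fx, x)])


# If the transform contributes nothing, non-negative inputs pass through unchanged.
print(residual_block([1.0, 2.0], lambda v: [0.0 for _ in v]))  # [1.0, 2.0]
```

Because the input is added back to the transformed features, each block only has to learn a correction to what it receives, and features computed early in the network remain available to later layers.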
It is also meaningful to summarize the limitations of the DL techniques in lung cancer. Table 8 emphasizes several limitations of the application of the proposed DL methodologies in lung cancer diagnosis that we were able to identify based on our systematic review. Some of them are generally well-known constraints, while some others are related to the imaging problem of lung biopsies.
Our findings demonstrate that the field of Digital Pathology for lung cancer diagnosis has evolved rapidly in the last 5 years. However, at least for most laboratories, the use of these capabilities in daily clinical practice is still in its early stages. Adopting a fully digital workflow can be challenging, and limitations must be overcome for implementation in the clinical setting. Digital slide generation is the first step in moving from traditional to Digital Pathology. WSI scanners provide high-quality images of histological and cytological slides. These images can be uploaded and remotely reviewed by pathologists and cytologists on a computer, and they can be made available for review by multiple pathologists. However, the organization and storage of large amounts of digitized data require high computing power, storage space, technical infrastructure, and backup capability. Furthermore, as a consequence of digitized data, ethical issues are arising concerning the sharing of sensitive personal data. DL models require large amounts of data for training, testing, and validation, which are retrieved from hospital archives. Therefore, a regulatory framework is essential to protect patients' rights and ensure the security and confidentiality of sensitive medical data.

Table 8. Limitations of the application of DL methodologies in lung cancer diagnosis.
Lack of interpretability and explainability: According to the review, only a few approaches focus on performing tasks that require common sense reasoning, such as understanding the physical characteristics of the cells. More explainable artificial intelligence approaches could be proposed in the future.
Training limitations with inadequate samples: Deep learning algorithms require massive amounts of labeled data to achieve good performance, and thus, thousands of annotations must be performed by pathologists.
Less powerful in problems beyond classification: Deep learning algorithms are mainly designed for classification problems, such as image recognition and natural language processing. They are less effective for other types of problems, such as regression, clustering, etc.
Lack of global generalization: Deep learning algorithms often overfit the training data and fail to generalize to new or unlabeled data. For example, a deep learning model may perform well on images from a specific microscopic scanner but poorly on images from a different microscope.
High memory and computational cost requirements: The training of deep models using extremely large size of images, such as biopsies, constitutes a very demanding process in terms of computational resources and training time of the supervision.

Conclusions
The field of Digital Pathology is evolving rapidly and, in the following years, is expected to be an inextricable part of a pathology laboratory. As highlighted above, AI-based approaches in Pathology are accompanied by several advantages, yet many challenges remain to be considered. Research for lung cancer diagnosis, prognosis, and prediction using DL methods is constantly improving to provide more accurate and reliable results. Moreover, for quantitative tasks, such as PD-L1 TPS estimation, the need for AI-based models is underlined because of their ability to provide reliable and objective assessment, eliminating subjective estimations that lead to intra- and inter-observer variability. The ongoing research and the efforts being made are at the forefront of transforming cancer diagnosis and treatment.

Data Availability Statement: All data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest: The authors declare no conflict of interest.